Solution Engineering - R
May 15, 2023
The goal of the project is to develop a model that predicts fraudulent account transactions.
To reach this main goal, we need to focus on several qualitative and quantitative goals.
To reach our main goal, the following qualitative goals are necessary:
To reach our main goal, the following quantitative goals are necessary (measured on test data):
The data contains the following features:
step - maps a unit of time in the real world. In this case, 1 step is 1 hour. Total steps: 744 (30-day simulation).
type - CASH_IN, CASH_OUT, DEBIT, PAYMENT and TRANSFER.
amount - amount of the transaction in local currency.
nameOrig - customer who started the transaction
oldbalanceOrg - initial balance before the transaction
newbalanceOrig - new balance after the transaction
nameDest - customer who is the recipient of the transaction
oldbalanceDest - initial balance of the recipient before the transaction. Note that there is no information for recipients whose name starts with M (merchants).
newbalanceDest - new balance of the recipient after the transaction. Note that there is no information for recipients whose name starts with M (merchants).
isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this specific dataset, the fraudulent agents aim to profit by taking control of customers' accounts and trying to empty the funds by transferring them to another account and then cashing out of the system.
isFlaggedFraud - The business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
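The column descriptions above suggest a few derived features. The following is a minimal sketch on made-up rows, not the project's actual feature engineering. The column names `day`, `hour_of_day`, `is_merchant` and `rule_flag` are hypothetical, step 1 is assumed to be the first hour of the simulation, and restricting the flagging rule to `TRANSFER` rows is one reading of the description that may not reproduce `isFlaggedFraud` exactly.

```r
library(dplyr)

# Toy rows shaped like the dataset; the values are made up for illustration.
toy <- tibble::tibble(
  step     = c(1, 24, 25, 743),
  type     = c("PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"),
  amount   = c(9839.64, 250000, 181, 150000),
  nameDest = c("M1979787155", "C553264065", "C38997010", "C1850423904")
)

features <- toy %>%
  mutate(
    # Assumption: step counts hours starting at 1, so steps 1..24 form day 1.
    day         = (step - 1) %/% 24 + 1,
    hour_of_day = (step - 1) %% 24,
    # Recipients whose name starts with "M" are merchants (no balance information).
    is_merchant = startsWith(nameDest, "M"),
    # Hypothetical re-implementation of the rule "transfer of more than 200,000".
    rule_flag   = type == "TRANSFER" & amount > 200000
  )

print(features)
```

Under this mapping, 744 hourly steps cover 31 days of 24 hours each.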
# Read data/Fraud.csv into a tibble with readr::read_csv for better performance and tidyverse support
library(dplyr)  # provides %>% and glimpse()
fraud <- readr::read_csv("data/Fraud.csv")
fraud %>% glimpse()
Rows: 6,362,620
Columns: 11
$ step <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ type <chr> "PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"…
$ amount <dbl> 9839.64, 1864.28, 181.00, 181.00, 11668.14, 7817.71, 71…
$ nameOrig <chr> "C1231006815", "C1666544295", "C1305486145", "C84008367…
$ oldbalanceOrg <dbl> 170136.0, 21249.0, 181.0, 181.0, 41554.0, 53860.0, 1831…
$ newbalanceOrig <dbl> 160296.36, 19384.72, 0.00, 0.00, 29885.86, 46042.29, 17…
$ nameDest <chr> "M1979787155", "M2044282225", "C553264065", "C38997010"…
$ oldbalanceDest <dbl> 0, 0, 0, 21182, 0, 0, 0, 0, 0, 41898, 10845, 0, 0, 0, 0…
$ newbalanceDest <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 4…
$ isFraud <dbl> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ isFlaggedFraud <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
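The glimpse output shows `type` as a character column and the two labels `isFraud` and `isFlaggedFraud` as doubles. Classification code often expects factors, so a conversion step along the following lines may be useful. This is a sketch on made-up rows; the labels "no" and "yes" are an illustrative choice and do not come from the source.

```r
library(dplyr)

# Made-up rows with the same column types as in the glimpse output.
toy <- tibble::tibble(
  type    = c("PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"),
  isFraud = c(0, 1, 1, 0)
)

typed <- toy %>%
  mutate(
    # Transaction type becomes a categorical variable.
    type    = factor(type),
    # Hypothetical labels for the 0/1 target.
    isFraud = factor(isFraud, levels = c(0, 1), labels = c("no", "yes"))
  )

str(typed)
```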
| Metric | Value |
|---|---|
| rows | 6362620 |
| columns | 11 |
| discrete_columns | 3 |
| continuous_columns | 8 |
| all_missing_columns | 0 |
| total_missing_values | 0 |
| complete_rows | 6362620 |
| total_observations | 69988820 |
| memory_usage | 1140722824 |
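Most figures in the overview above can be recomputed with base R, which helps when checking them by hand. For example, total_observations is rows times columns (6362620 × 11 = 69988820). The sketch below uses a made-up data frame with one missing value; the vector name `overview` is hypothetical.

```r
# Made-up data frame with one missing value, for illustration only.
toy <- data.frame(
  type   = c("PAYMENT", "TRANSFER", "CASH_OUT"),
  amount = c(9839.64, NA, 181),
  stringsAsFactors = FALSE
)

overview <- c(
  rows                 = nrow(toy),
  columns              = ncol(toy),
  total_missing_values = sum(is.na(toy)),           # number of NA cells
  complete_rows        = sum(complete.cases(toy)),  # rows without any NA
  total_observations   = nrow(toy) * ncol(toy)      # number of cells
)

print(overview)
```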
| Statistic | step | amount | oldbalanceOrg | newbalanceOrig | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|---|---|---|---|---|---|---|---|---|
| Min. | 1.0 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0.0e+00 |
| 1st Qu. | 156.0 | 13390 | 0 | 0 | 0 | 0 | 0.000000 | 0.0e+00 |
| Median | 239.0 | 74872 | 14208 | 0 | 132706 | 214661 | 0.000000 | 0.0e+00 |
| Mean | 243.4 | 179862 | 833883 | 855114 | 1100702 | 1224996 | 0.001291 | 2.5e-06 |
| 3rd Qu. | 335.0 | 208721 | 107315 | 144258 | 943037 | 1111909 | 0.000000 | 0.0e+00 |
| Max. | 743.0 | 92445517 | 59585040 | 49585040 | 356015889 | 356179279 | 1.000000 | 1.0e+00 |

type, nameOrig, nameDest: Length 6362620, Class character, Mode character
# A tibble: 6 × 11
step type amount nameOrig oldba…¹ newba…² nameD…³ oldba…⁴ newba…⁵ isFraud
<dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 PAYMENT 9840. C123100… 170136 160296. M19797… 0 0 0
2 1 PAYMENT 1864. C166654… 21249 19385. M20442… 0 0 0
3 1 TRANSFER 181 C130548… 181 0 C55326… 0 0 1
4 1 CASH_OUT 181 C840083… 181 0 C38997… 21182 0 1
5 1 PAYMENT 11668. C204853… 41554 29886. M12307… 0 0 0
6 1 PAYMENT 7818. C900456… 53860 46042. M57348… 0 0 0
# … with 1 more variable: isFlaggedFraud <dbl>, and abbreviated variable names
# ¹oldbalanceOrg, ²newbalanceOrig, ³nameDest, ⁴oldbalanceDest,
# ⁵newbalanceDest
# A tibble: 6 × 11
step type amount nameO…¹ oldba…² newba…³ nameD…⁴ oldba…⁵ newba…⁶ isFraud
<dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 743 TRANSFER 3.40e5 C20139… 3.40e5 0 C18504… 0 0 1
2 743 CASH_OUT 3.40e5 C78648… 3.40e5 0 C77691… 0 3.40e5 1
3 743 TRANSFER 6.31e6 C15290… 6.31e6 0 C18818… 0 0 1
4 743 CASH_OUT 6.31e6 C11629… 6.31e6 0 C13651… 6.85e4 6.38e6 1
5 743 TRANSFER 8.50e5 C16859… 8.50e5 0 C20803… 0 0 1
6 743 CASH_OUT 8.50e5 C12803… 8.50e5 0 C87322… 6.51e6 7.36e6 1
# … with 1 more variable: isFlaggedFraud <dbl>, and abbreviated variable names
# ¹nameOrig, ²oldbalanceOrg, ³newbalanceOrig, ⁴nameDest, ⁵oldbalanceDest,
# ⁶newbalanceDest
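In the rows printed above, every fraudulent transaction moves the full `oldbalanceOrg` and leaves `newbalanceOrig` at 0, as far as the rounded values show. This matches the description of `isFraud`. The sketch below flags such transactions. The rows are copied from rows 1, 3 and 4 of the first printout; the column name `empties_account` and the tolerance of 0.01 are hypothetical choices.

```r
library(dplyr)

# Rows 1, 3 and 4 of the first printout, typed in by hand.
toy <- tibble::tibble(
  type           = c("PAYMENT", "TRANSFER", "CASH_OUT"),
  amount         = c(9839.64, 181, 181),
  oldbalanceOrg  = c(170136, 181, 181),
  newbalanceOrig = c(160296.36, 0, 0),
  isFraud        = c(0, 1, 1)
)

checked <- toy %>%
  mutate(
    # Hypothetical helper column: TRUE when the transaction moves the whole
    # balance and leaves the origin account at zero.
    empties_account = amount > 0 &
      abs(amount - oldbalanceOrg) < 0.01 &
      newbalanceOrig == 0
  )

# Cross-tabulate the helper column against the fraud label.
print(count(checked, isFraud, empties_account))
```

Whether this pattern holds across the full dataset would need to be checked on all rows, not just the ones printed here.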
The following chart illustrates the statistics presented above.
| isFlaggedFraud | isFraud | n |
|---|---|---|
| 0 | 0 | 6354407 |
| 0 | 1 | 8197 |
| 1 | 1 | 16 |
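The counts in the table can be turned into two rates that show how imbalanced the data is and how little of the fraud `isFlaggedFraud` catches. The following is a small worked calculation using only the numbers from the table; the variable names are hypothetical.

```r
# Counts copied from the table above.
n_legit          <- 6354407  # isFlaggedFraud = 0, isFraud = 0
n_fraud_unflag   <- 8197     # isFlaggedFraud = 0, isFraud = 1
n_fraud_flagged  <- 16       # isFlaggedFraud = 1, isFraud = 1

n_total <- n_legit + n_fraud_unflag + n_fraud_flagged
n_fraud <- n_fraud_unflag + n_fraud_flagged

fraud_rate  <- n_fraud / n_total          # share of fraudulent transactions
flag_recall <- n_fraud_flagged / n_fraud  # share of fraud caught by isFlaggedFraud

cat(sprintf("total rows:  %d\n", as.integer(n_total)))
cat(sprintf("fraud rate:  %.4f%%\n", 100 * fraud_rate))
cat(sprintf("flag recall: %.4f%%\n", 100 * flag_recall))
```

The total matches the 6,362,620 rows of the dataset, and the fraud rate matches the mean of `isFraud` (0.001291) reported in the summary statistics.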
Link to our repository:
https://git-inf.technikum-wien.at/soe-r/soe-2023-b/fraudulent-transaction-classification/fraud/-/milestones
Backlog
Board
Milestone plan
Deliverables